Device and method for processing instructions based on masked register group size information

ABSTRACT

A method and a device for processing instructions based on register group size information includes a pipelined processor, an instruction memory unit and a register file, whereas the pipelined processor includes a write-back unit and an execution unit. The device is characterized by including a controller that is adapted to receive a first register group size information and a first register identification information that define a first group of source registers associated with a first instruction; and to determine an execution related operation of the first instruction in response to the first register group size information, the first register identification information, a second register group size information and a second register identification information. The second register group size information and the second register identification information define a second group of target registers associated with a second instruction. The second instruction is provided to the pipelined processor before the first instruction.

FIELD OF THE INVENTION

The present invention relates to a device and method for processing instructions and especially for performing feed-forward operations.

BACKGROUND OF THE INVENTION

Modern processors are required to execute complex tasks at very high speeds. The introduction of pipelined processor architectures improved the performance of modern processors but also introduced some problems. In a pipelined architecture, the execution of an instruction is split into multiple stages. The PowerPC™ processor family of Freescale Semiconductor, Inc. is an example of pipelined processors.

Pipelined processors experience stalls. A stall occurs when the execution of a current instruction depends upon information that is not ready on time.

One method for reducing the number of stalls, and alternatively or additionally decreasing their duration, is to perform feed-forwarding. Feed-forwarding usually involves retrieving information before it is sent to a register file. In many cases processed information is both provided to one of the pipelined units of the processor and also sent (written back) to the register file.

Various prior art processors are capable of performing simple feed-forwarding operations. A simple feed-forward operation involves one target register and one source register. Some prior art processors and methods for simple feed-forwarding operations are illustrated in U.S. Pat. No. 6,901,504 of Luick and in U.S. Pat. No. 6,145,097 of Moyer et al., both being incorporated herein by reference.

There is a need to provide an efficient method and device for performing complex feed-forward operations.

SUMMARY OF THE PRESENT INVENTION

A method and device for processing instructions, as described in the accompanying claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood and appreciated more fully from the following detailed description taken in conjunction with the drawings in which:

FIG. 1 is a schematic illustration of a device according to an embodiment of the invention;

FIG. 2 illustrates an execute multiplexing unit according to an embodiment of the invention;

FIG. 3 illustrates a controller and an instruction logic according to an embodiment of the invention;

FIG. 4 illustrates an execute feed-forward unit according to an embodiment of the invention;

FIG. 5 illustrates a stall unit according to an embodiment of the invention;

FIG. 6 illustrates an issue feed-forward unit according to an embodiment of the invention;

FIG. 7 illustrates a register file, according to an embodiment of the invention; and

FIG. 8 illustrates a method for processing instructions, according to an embodiment of the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The following description refers to a method and system for processing instructions, and especially for performing efficient feed-forwarding operations even when consecutive instructions are associated with multiple source registers and multiple target registers.

According to an embodiment of the invention instructions are associated with groups of registers and these groups are represented by register identification information and register group size information. The register group size information can be used to mask the register identification information and to perform comparisons between masked information representative of groups of registers.

Conveniently, complex feed-forward operations involve many target registers and/or multiple source registers. Instead of performing a large number of single-register comparisons, the method and device perform a few comparisons (or even a single comparison) between selectively masked information that represents groups of registers.

FIG. 1 illustrates device 100, according to an embodiment of the invention. Device 100 can be an integrated circuit, multiple integrated circuits, a mobile phone, personal data accessory, media player, computer, and the like. Those of skill in the art will appreciate that device 100 can include many components and units that are not illustrated in FIG. 1, as well as include fewer components or other components than those that are illustrated in FIG. 1.

Device 100 includes pipelined processor 110, instruction memory unit 120 and data memory unit 122. The pipelined processor 110 is connected to the instruction memory unit 120 and to the data memory unit 122. It fetches instructions from the instruction memory unit 120 and fetches data from the data memory unit 122.

The pipelined processor 110 includes a fetch unit 112, an issue unit 114, an execute unit 116 and a write-back unit 118. These units are connected in a substantially serial manner to each other, although the write-back unit 118 can provide information to the execute unit 116 and the issue unit 114. It is noted that the issue unit 114 is also referred to as a decode unit.

Pipelined processor 110 also includes a controller 140, an instruction logic 150, a register file 130, an issue multiplexing unit 180 and an execute multiplexing unit 190. It is noted that at least some of these units can be located outside the pipelined processor 110.

The fetch unit 112 is connected to instruction memory unit 120 and to the issue unit 114. The issue unit 114 is further connected to the data memory unit 122, to the instruction logic 150 and to an output of the issue multiplexing unit 180. The execute unit 116 is connected between the issue unit 114 and the write-back unit 118. It is further connected to the output of the execute multiplexing unit 190 and to the data memory unit 122.

The issue multiplexing unit 180 and the execute multiplexing unit 190 are controlled by controller 140. These multiplexing units 180 and 190 select whether to provide information from the register file 130 or (during a feed-forward operation) from the write-back unit 118.

The register file 130 includes multiple registers, such as registers R1-R8 131-138 of FIG. 7. Some of the instructions that are executed by the pipelined processor can be associated with groups of registers, and especially with groups of registers that include a sequence of consecutive registers.

For example, a LOAD.B8 instruction or a STORE.B8 instruction involves eight bytes, whereas each register out of R1-R8 131-138 is four bytes long. Thus, each of these instructions is associated with a group of source registers that includes a pair of registers.

Conveniently, each group of registers is represented by register group size information and by register identification information. The register identification information can indicate the address of the first register of the group.

For example, a LOAD.B8 R1, R5 instruction means that the four bytes that are pointed to by R5 135 and the next four bytes (pointed to by the value within R5 plus four) should be loaded to registers R1 131 and R2 132 respectively. This instruction includes a register group size field (indicating that two registers belong to the group), and register identification information that identifies R1 131.

As yet another example, a STORE Q R1 instruction means that the least significant sixteen bits of registers R1-R4 131-134 should be sent to an external memory.

The controller 140 can determine whether to perform an execution related operation such as a stall or a feed-forward operation based upon the relationship between a group of source registers associated with a first instruction and a group of target registers associated with a second instruction that was provided to the pipelined processor 110 before the first instruction.

Conveniently, the controller 140 is adapted to: (i) receive a second register group size information and a second register identification information that define a second group of target registers associated with a second instruction, (ii) receive a first register group size information and a first register identification information that define a first group of source registers associated with a first instruction, and to (iii) determine an execution related operation of the first instruction in response to the first register group size information, the first register identification information, the second register group size information and the second register identification information.

Controller 140 enables a quick determination of whether to perform a stall operation or a feed-forward operation. The determination can include comparing between masked information that represents groups of registers. If, for example, a second instruction is associated with two consecutive source registers then the least significant bit of the register identification information is masked and the masked information represents both source registers. If, for example, a first instruction is associated with a group of four source registers then the two least significant bits of the first register identification information are masked.

The masking operation, as well as an arrangement of registers such that consecutive registers are accessed by consecutive register addresses, reduces the number of comparisons, even if consecutive instructions are associated with many registers.
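By way of a non-limiting illustration, the masking and comparison described above can be sketched in Python. The function names, the three-bit identifier width and the choice to compare both identifiers under the mask of the larger group are assumptions made for this sketch only; the description itself states merely that masked information is compared.

```python
def group_mask(group_size: int, id_bits: int = 3) -> int:
    """Return a mask that clears the low bits covered by a register group.

    A group of 1 masks no bits, a group of 2 masks the least significant
    bit, and a group of 4 masks the two least significant bits.
    """
    if group_size not in (1, 2, 4):
        raise ValueError("group size must be 1, 2 or 4 in this sketch")
    low_bits = group_size.bit_length() - 1
    return ((1 << id_bits) - 1) & ~((1 << low_bits) - 1)


def groups_overlap(first_id: int, first_size: int,
                   second_id: int, second_size: int,
                   id_bits: int = 3) -> bool:
    """Single comparison replacing many register-by-register comparisons.

    Assumption of this sketch: both identifiers are masked with the mask
    of the larger group, so that size-aligned groups overlap exactly
    when their masked identifiers are equal.
    """
    wider = group_mask(max(first_size, second_size), id_bits)
    return (first_id & wider) == (second_id & wider)


# R1..R8 are addressed by 000..111, so R2 is 0b001 and R1 is 0b000.
# A single-register source R2 overlaps a two-register target group R1-R2.
print(groups_overlap(0b001, 1, 0b000, 2))
```

In this sketch one equality test covers up to four registers, which is the reduction in comparisons that the masking is meant to provide.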

The write-back unit 118 is connected to the register file 130, to the issue multiplexing unit 180 and to the execute multiplexing unit 190 via a write-back bus 119. Conveniently, the write-back bus 119 includes a load write-back bus 119_1 and an ALU write-back bus 119_2. Buses 119_1 and 119_2 are illustrated in FIG. 2 and in FIG. 7.

The load write-back bus 119_1 conveys the results of load operations. The ALU write-back bus 119_2 conveys the results of ALU operations. It is noted that the execute unit 116 is connected to the write-back unit 118 via two buses that correspond to buses 119_1 and 119_2. For simplicity of explanation a single write-back bus 119 is illustrated in FIG. 1.

FIG. 2 illustrates an execute multiplexing unit 190 according to an embodiment of the invention.

It is assumed that the execute unit 116 includes four operand inputs 116(1)-116(4). Each operand input can receive operands from the register file (via register file bus 139) or from the write-back unit 118 (if a feed-forward operation occurs).

Each multiplexer out of first till fourth execute multiplexers 191-194 is connected to the register file bus 139 and to the write-back bus 119. It is noted that each bus can be connected to multiple inputs of each multiplexer, thus allowing selective retrieval of information that is shorter than the bus width.

The first execute multiplexer 191 is controlled by a first execute mux control signal 201. The second execute multiplexer 192 is controlled by a second execute mux control signal 202. The third execute multiplexer 193 is controlled by a third execute mux control signal 203. The fourth execute multiplexer 194 is controlled by a fourth execute mux control signal 204. The control signals are generated by the execute feed-forward unit 144 of controller 140.

Conveniently, the inventors used multiplexers that had six inputs. A first input (32-bit wide) received the register file bus 139. A second input (32-bit wide) received the ALU write-back bus 119_2 (32-bit wide). Four additional inputs received four groups (of sixteen lines each) of lines of load write-back bus 119_1. For simplicity of explanation fewer inputs were illustrated.

FIG. 3 illustrates a controller 140 and an instruction logic 150, according to an embodiment of the invention.

The controller 140 includes a stall unit 142, an execute feed-forward unit 144 and an issue feed-forward unit 146. The stall unit 142 determines whether to stall an execution of a currently received instruction. The execute feed-forward unit 144 determines whether the execute unit 116 should fetch operands from the register file or from the write-back bus 119. It sends four control signals (first till fourth execute mux control signals 201-204) to first till fourth execute multiplexers 191-194. The issue feed-forward unit 146 determines whether the issue unit 114 should fetch operands from the register file or from the write-back bus 119. It sends four control signals (first till fourth issue mux control signals 211-214) to first till fourth issue multiplexers that belong to issue multiplexing unit 180.

Each of these units (142-146) receives currently received register identification information and currently received register group size information. In addition these units receive information from previously received instructions. These units selectively mask the register identification information to provide masked information and compare between the masked information. It is noted that if an instruction is associated with one source register or one target register then the register identification information is not masked.

The instruction logic 150 provides to the controller 140 register group size information and register identification information in accordance with an execution process of instructions associated with the register group size information and register identification information. The instruction logic 150 includes multiple delay units that form two delay paths 156 and 157, whereas the length of each delay path is responsive to an execution period of a certain type of instruction.

For example, instruction logic 150 includes a first delay path 156 that emulates the execution of short duration instructions. The first delay path 156 includes first and second delay units 151 and 152 that represent the execution stage and write-back stage of a short instruction such as an instruction that is executed by an arithmetic logic unit.

The second delay path 157 includes third, fourth and fifth delay units 153-155 that represent two load stages (load address and load data) and a write-back stage of a long instruction.

A current instruction (also referred to as a first instruction) 220 or at least a portion of said instruction (such as first register identification information 231 and first register group size information 232) is provided to short instruction validation logic 161 and to long instruction validation logic 162. Each of these logics determines, according to the content of the first instruction, whether it is a long or a short instruction. If it is a long instruction then the long instruction validation logic 162 associates a valid flag with this instruction, and the short instruction validation logic 161 associates an invalid flag with this instruction. If it is a short instruction then the long instruction validation logic 162 associates an invalid flag with this instruction, and the short instruction validation logic 161 associates a valid flag with this instruction. Then the instruction is provided to delay paths 156 and 157.

It is noted that some instructions can propagate (associated with valid flags) over more than one delay path.

It is further noted that a switching logic can be used instead of two logics 161 and 162. It is further noted that either one of logics 161 and 162 can also prevent the propagation of invalid instruction information over the delay paths.

It is further noted that if there are more than two instruction types then additional delay paths can be provided. An instruction type is characterized by the duration of its execution.

Those of skill in the art will appreciate that pipelines that have different lengths than four or five cycles can be emulated by delay paths that have different lengths than those of delay paths 156 and 157.

The first delay unit 151 provides information that is delayed by one clock cycle. For convenience of explanation it is referred to as second register group size information 234 and second register identification information 233. This information is valid if during a previous cycle the received instruction was a short instruction.

The second delay unit 152 provides information that is delayed by two clock cycles. For convenience of explanation it is referred to as fourth register group size information 247 and fourth register identification information 246. This information is valid if two cycles ago the received instruction was a short instruction.

The third delay unit 153 provides information that is delayed by one clock cycle. For convenience of explanation it is referred to as sixth register group size information 242 and sixth register identification information 241. This information is valid if during a previous cycle the received instruction was a long instruction.

The fourth delay unit 154 provides information that is delayed by two clock cycles. For convenience of explanation it is referred to as third register group size information 237 and third register identification information 236. This information is valid if two clock cycles ago the received instruction was a long instruction.

The fifth delay unit 155 provides information that is delayed by three clock cycles. For convenience of explanation it is referred to as fifth register group size information 252 and fifth register identification information 251. This information is valid if three clock cycles ago the received instruction was a long instruction.
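As a non-limiting software analogy of the two delay paths, the following Python sketch models each path as a short shift register whose entries carry a valid flag. The class names `RegInfo`, `DelayPath` and `InstructionLogic` are hypothetical and introduced for this sketch only, and a single register group per instruction is assumed for simplicity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RegInfo:
    """Register identification and group size information with a valid flag."""
    reg_id: int
    group_size: int
    valid: bool = True


INVALID = RegInfo(reg_id=0, group_size=1, valid=False)


class DelayPath:
    """Chain of delay units; tap(n) returns the information delayed by n cycles."""

    def __init__(self, length: int) -> None:
        self.units = deque([INVALID] * length, maxlen=length)

    def clock(self, info: RegInfo) -> None:
        # The newest entry enters at the left; the oldest falls off the right.
        self.units.appendleft(info)

    def tap(self, cycles: int) -> RegInfo:
        return self.units[cycles - 1]


class InstructionLogic:
    """Two paths: short (delay units 151-152) and long (delay units 153-155)."""

    def __init__(self) -> None:
        self.short_path = DelayPath(2)
        self.long_path = DelayPath(3)

    def clock(self, info: RegInfo, is_long: bool) -> None:
        # The validation logics mark the instruction valid on one path only.
        self.short_path.clock(INVALID if is_long else info)
        self.long_path.clock(info if is_long else INVALID)


logic = InstructionLogic()
logic.clock(RegInfo(reg_id=0b100, group_size=4), is_long=True)   # e.g. a quad load
logic.clock(RegInfo(reg_id=0b000, group_size=1), is_long=False)  # e.g. an ALU op
print(logic.long_path.tap(2))   # "third" information: long, two cycles old
print(logic.short_path.tap(1))  # "second" information: short, one cycle old
```

Under this mapping, `short_path.tap(1)` and `short_path.tap(2)` correspond to the second and fourth information, and `long_path.tap(1)`, `tap(2)` and `tap(3)` correspond to the sixth, third and fifth information respectively.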

It is noted that the first instruction 220 can be provided after a decoding sequence occurs.

The execute feed-forward unit 144 is adapted to determine an execution of a feed-forward operation to the execute unit 116. It receives the first, second and third register group size information 232, 234 and 237 and uses it to mask the first, second and third register identification information 231, 233 and 236 to provide first, second and third masked information 238, 235 and 239.

If the first masked information 238 equals a valid information out of the second masked information 235 and the third masked information 239 the execute feed-forward unit 144 sends first till fourth execute feed-forward mux control signals 201-204 that instruct first till fourth execute multiplexers 191-194 to perform a feed-forward operation.

Referring to FIG. 4, execute feed-forward unit 144 includes a first execute feed-forward mask unit 144(1) that generates the first masked information 238, a second execute feed-forward mask unit 144(2) that generates the second masked information 235 and a third execute feed-forward mask unit 144(3) that generates the third masked information 239. The execute feed-forward comparator 144(4) compares between the masked information and determines the value of first till fourth execute feed-forward mux control signals 201-204.
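The decision made by the execute feed-forward comparator can be sketched, in a non-limiting manner, as follows. The names `RegInfo` and `execute_feed_forward` are hypothetical, and the sketch assumes that identifiers of groups of different sizes are compared under the mask of the larger group; the description only states that masked information is compared and that only valid information is taken into account.

```python
from typing import NamedTuple


class RegInfo(NamedTuple):
    """Register identification and group size information with a valid flag."""
    reg_id: int
    group_size: int  # 1, 2 or 4 registers in this sketch
    valid: bool = True


def mask_for(group_size: int, id_bits: int = 3) -> int:
    """Mask that clears the low identifier bits spanned by the group."""
    low_bits = group_size.bit_length() - 1
    return ((1 << id_bits) - 1) & ~((1 << low_bits) - 1)


def execute_feed_forward(first: RegInfo, second: RegInfo, third: RegInfo) -> bool:
    """Return True when a feed-forward to the execute unit is indicated.

    `first` describes the source group of the current instruction;
    `second` and `third` are the delayed informations of earlier
    instructions. Invalid candidates never match.
    """
    for candidate in (second, third):
        if not candidate.valid:
            continue
        # Assumed comparison rule: use the mask of the larger group.
        mask = mask_for(max(first.group_size, candidate.group_size))
        if (first.reg_id & mask) == (candidate.reg_id & mask):
            return True
    return False


# Current instruction reads R7-R8 (identifier 110, two registers); an
# earlier quad load targets R5-R8 (identifier 100, four registers).
print(execute_feed_forward(RegInfo(0b110, 2),
                           RegInfo(0b000, 1, valid=False),
                           RegInfo(0b100, 4)))
```

The same comparison pattern would apply to the issue feed-forward unit 146, with the fourth and fifth information taking the place of the second and third.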

The stall unit 142 determines whether to stall the execution of the first instruction. It receives the first, second, third and sixth register group size information 232, 234, 237 and 242 and uses it to mask the first, second, third and sixth register identification information 231, 233, 236 and 241 to provide first, second, third and sixth masked information 238, 235, 239 and 243.

If the first masked information 238 equals (i) a valid information out of the second masked information 235 and third masked information 239, and (ii) the sixth masked information 243 then the stall unit 142 sends a stall indication 251.

Referring to FIG. 5, stall unit 142 includes a first stall mask unit 142(1) that generates the first masked information 238, a second stall mask unit 142(2) that generates the second masked information 235, a third stall mask unit 142(3) that generates the third masked information 239, and a fourth stall mask unit 142(4) that generates the sixth masked information 243. The stall comparator 142(5) compares between the masked information and determines whether to generate a stall indication 251.

The issue feed-forward unit 146 is adapted to determine an execution of a feed-forward operation to the issue unit 114. It receives the first, fourth and fifth register group size information 232, 247 and 252 and uses it to mask the first, fourth and fifth register identification information 231, 246 and 251 to provide first, fourth and fifth masked information 238, 248 and 253.

If the first masked information 238 equals a valid information out of the fourth masked information 248 and the fifth masked information 253 the issue feed-forward unit 146 sends first till fourth issue feed-forward mux control signals 211-214 that instruct first till fourth issue multiplexers of the issue multiplexing unit 180 to perform a feed-forward operation.

Referring to FIG. 6, issue feed-forward unit 146 includes a first issue feed-forward mask unit 146(1) that generates the first masked information 238, a second issue feed-forward mask unit 146(2) that generates the fourth masked information 248 and a third issue feed-forward mask unit 146(3) that generates the fifth masked information 253. The issue feed-forward comparator 146(4) compares between the masked information and determines the value of first till fourth issue feed-forward mux control signals 211-214.

FIG. 7 illustrates a register file 130, according to an embodiment of the invention.

Register file 130 includes eight registers R1-R8 131-138, and a register file controller 130′. The register file controller 130′ controls the access to the registers. Conveniently, the eight registers have consecutive addresses, starting with R1 131.

Register file 130 includes eight registers that can be addressed by using three address (or register identification information) bits. It is noted that register files that have more registers should be addressed by more than three address bits.

It is assumed that each register is four bytes long and that the odd registers R1 131, R3 133, R5 135 and R7 137 are connected to the most significant lines of the load write-back bus 119_1 (load write-back bus lines 119_1(0)-119_1(31)) and to the most significant lines of the register file bus 139 (register file bus lines 139(0)-139(31)). The even registers R2 132, R4 134, R6 136 and R8 138 are connected to the least significant lines of the load write-back bus 119_1 (load write-back bus lines 119_1(32)-119_1(63)), to the most significant lines of the load write-back bus 119_1 (load write-back bus lines 119_1(0)-119_1(31)), and to the least significant lines of the register file bus 139 (register file bus lines 139(32)-139(63)).

All registers are connected to the ALU write-back bus 119_2.

R1 131 is addressed by register identification information 000, R2 132 is addressed by register identification information 001, R3 133 is addressed by register identification information 010, R4 134 is addressed by register identification information 011, R5 135 is addressed by register identification information 100, R6 136 is addressed by register identification information 101, R7 137 is addressed by register identification information 110 and R8 138 is addressed by register identification information 111.

When an instruction is associated with R1-R4 131-134 or R5-R8 135-138 then the two least significant bits of the register identification information should be masked when controller 140 determines whether to perform a stall operation or a feed-forward operation.

When an instruction is associated with R1-R2, R3-R4, R5-R6 or R7-R8 then the least significant bit of the register identification information should be masked when controller 140 determines whether to perform a stall operation or a feed-forward operation.
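To illustrate, in a non-limiting manner, which registers a masked identifier stands for under the address assignment above (R1 at 000 through R8 at 111), the following Python sketch expands an identifier and a group size into register names. The helper name `registers_in_group` is hypothetical.

```python
def registers_in_group(reg_id: int, group_size: int) -> list:
    """List the registers represented by a masked register identifier.

    The low bits covered by the group are cleared, so the group is
    always treated as starting on a boundary aligned to its size.
    """
    if group_size not in (1, 2, 4):
        raise ValueError("group size must be 1, 2 or 4 in this sketch")
    base = reg_id & ~(group_size - 1)
    # Identifier 000 addresses R1, hence the +1 when forming the name.
    return [f"R{base + offset + 1}" for offset in range(group_size)]


print(registers_in_group(0b100, 4))  # quad group starting at R5
print(registers_in_group(0b010, 2))  # pair starting at R3
print(registers_in_group(0b001, 4))  # unaligned start is folded onto R1-R4
```

The last call shows why this simple scheme only distinguishes groups that start on an aligned boundary: an identifier that points into the middle of a group is indistinguishable from the identifier of the group's first register once masked.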

Those of skill in the art will appreciate that if the register identification information is assigned in a different manner then the masking operation can be adapted accordingly.

It is noted that this masking scheme imposes various limitations upon multiple register instructions (for example, an instruction that addresses four registers can only start at R1 or R5), but these limitations can be overcome by using more complex masking schemes. Various prior art masking schemes can be applied to provide more flexible usage of registers.

TABLE 1 provides various examples of the signals provided by the issue feed-forward unit 146. Its first column includes a previous instruction, its second column illustrates the current instruction and the third column illustrates the first till fourth issue mux control signals 211-214. A default signal (“d”) means that no feed-forwarding is required and that the operand is retrieved from register file 130. Other values indicate the lines of the write-back bus 119 from which to retrieve the information. Conveniently, each multiplexer (such as multiplexers 191-194) includes at least one input for each bus out of write-back bus 119 and register file bus 139. Multiple inputs enable the selection of one group of bus lines out of multiple groups of bus lines. If, for example, each group is two bytes long then a multiplexer can include up to four inputs per eight-byte wide bus.

Symbol “R” (without any following number) denotes any register from the register file. Symbol “Q” denotes a QUAD register operation. Symbol “B8” denotes a double register operation. Symbol “d” indicates a retrieval of information from register file 130.

TABLE 1

Previous          Current           First till fourth issue mux
instruction       instruction       control signals 211-214
LOAD R1, R        STORE R1, R       000, d, d, d
LOAD R1, R        STORE.B8 R1, R    000, d, d, d
LOAD R1, R        STORE.Q R1, R     000, d, d, d
LOAD R2, R        STORE R1, R       000, d, d, d
LOAD R2, R        STORE.B8 R1, R    d, 000, d, d
LOAD R2, R        STORE.Q R1, R     d, 000, d, d
LOAD R3, R        STORE R3, R       000, d, d, d
LOAD R3, R        STORE.B8 R3, R    000, d, d, d
LOAD R3, R        STORE.Q R1, R     d, d, 000, d
LOAD R4, R        STORE R4, R       000, d, d, d
LOAD R4, R        STORE.B8 R3, R    d, 000, d, d
LOAD R4, R        STORE.Q R1, R     d, d, d, 000
LOAD.B8 R1, R     STORE R1, R       00S[4], d, d, d
LOAD.B8 R1, R     STORE R2, R       00S[4], d, d, d
LOAD.B8 R1, R     STORE.B8 R1, R    00S[4], d, d, d
LOAD.B8 R1, R     STORE.Q R1, R     000, 001, d, d
LOAD.B8 R3, R     STORE R3, R       00S[4], d, d, d
LOAD.B8 R3, R     STORE R4, R       00S[4], d, d, d
LOAD.B8 R3, R     STORE.B8 R3, R    00S[4], 001, d, d
LOAD.B8 R3, R     STORE.Q R1, R     d, d, 000, 001
LOAD.Q R1, R      STORE R1, R       100, d, d, d
LOAD.Q R1, R      STORE R2, R       101, d, d, d
LOAD.Q R1, R      STORE R3, R       110, d, d, d
LOAD.Q R1, R      STORE R4, R       111, d, d, d
LOAD.Q R1, R      STORE.B8 R1, R    100, 101, d, d
LOAD.Q R1, R      STORE.B8 R3, R    110, 111, d, d
LOAD.Q R1, R      STORE.Q R1, R     100, 101, 110, 111

TABLE 2 illustrates the execution stages of an exemplary sequence of instructions: (I1) LOAD.Q R5, R4; (I2) SUB R1, R1; (I3) ADD R2, R2 and (I4) STORE.B8 R7, R3.

Each column of TABLE 2 illustrates one clock cycle. It is noted that at clk5 an issue feed-forward operation occurs. Instruction I4 requires the content of registers R7 and R8. The content is ready at the end of clk4. The feed-forward operation of both registers occurs in parallel to the write-back to the register file 130.

TABLE 2

      Clk1          Clk2          Clk3          Clk4          Clk5          Clk6          Clk7
I1    fetch         issue         load address  load data     write-back
I2                  fetch         issue         execute       write-back
I3                                fetch         issue         execute       write-back
I4                                              fetch         issue         execute       write-back

Clk5: issue feed-forward of R7 and R8.
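The staggered stage timing shown in TABLE 2 can be reproduced, for illustration only, by a short Python sketch. It assumes one instruction is fetched per clock cycle and that no stalls occur; the stage lists and the function name `schedule` are assumptions of this sketch rather than part of the described device.

```python
SHORT_STAGES = ["fetch", "issue", "execute", "write-back"]
LONG_STAGES = ["fetch", "issue", "load address", "load data", "write-back"]


def schedule(kinds):
    """Map each instruction to {clock cycle: stage}, assuming no stalls.

    `kinds` lists "short" or "long" per instruction in program order;
    instruction number i (counting from 1) is fetched at clock cycle i.
    """
    timeline = []
    for index, kind in enumerate(kinds):
        stages = LONG_STAGES if kind == "long" else SHORT_STAGES
        first_clock = index + 1
        timeline.append({first_clock + offset: stage
                         for offset, stage in enumerate(stages)})
    return timeline


# I1 is a long (load) instruction; I2-I4 are treated as short ones.
table2 = schedule(["long", "short", "short", "short"])
for number, stages in enumerate(table2, start=1):
    print(f"I{number}", stages)
```

In the resulting timeline I1 reaches its write-back stage in the same clock cycle in which I4 is issued, which is the cycle in which the issue feed-forward takes place.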

TABLE 3 illustrates the execution stages of an exemplary sequence of instructions: (I1) SUB R1, R2; (I2) ADD R7, R8 and (I3) ADD R4, R1.

Each column of TABLE 3 illustrates one clock cycle. It is noted that at clk4 an issue feed-forward operation occurs. Instruction I3 requires the content of register R1. The content is ready at the end of clk3. The feed-forward operation occurs in parallel to the write-back to the register file 130.

TABLE 3

      Clk1          Clk2          Clk3          Clk4          Clk5          Clk6
I1    fetch         issue         execute       write-back
I2                  fetch         issue         execute       write-back
I3                                fetch         issue         execute       write-back

Clk4: issue feed-forward of R1.

TABLE 4 illustrates the execution stages of an exemplary sequence of instructions: (I1) LOAD.B8 R1, R5 and (I2) LOAD R7, R2.

Each column of TABLE 4 illustrates one clock cycle. It is noted that at clk3 and at clk4 a stall operation occurs as I1 does not update the content of its target registers R1 and R2 till the end of clk4. At clk5 an issue feed-forward operation occurs (of R2 that is needed as the address of the second LOAD). The feed-forward operation occurs in parallel to the write-back to the register file 130.

TABLE 4

      Clk1          Clk2          Clk3          Clk4          Clk5          Clk6          Clk7
I1    fetch         issue         load address  load data     write-back
I2                  fetch         stall         stall         issue         execute       write-back

Clk3: stall, R1 is not ready.
Clk4: stall, R1 is not ready.
Clk5: issue feed-forward of R1.

TABLE 5 illustrates the execution stages of an exemplary sequence of instructions: (I1) LOAD.Q R5, R20; (I2) SUB R3, R4; (I3) ADD R1, R8.

Each column of TABLE 5 illustrates one clock cycle. The content of R8 is needed by I3. The content is ready at the end of clk4. At clk5 an execute feed-forward operation occurs (of R8). The feed-forward operation occurs in parallel to the write-back to the register file 130.

TABLE 5

      Clk1          Clk2          Clk3          Clk4          Clk5          Clk6
I1    fetch         issue         load address  load data     write-back
I2                  fetch         issue         execute       write-back
I3                                fetch         issue         execute       write-back

Clk5: execute feed-forward of R8.

FIG. 8 illustrates a method 300 for processing instructions, according to an embodiment of the invention.

For convenience of explanation the following description refers to an execution of two instructions. Those of skill in the art will appreciate that the number of instructions that can affect each other can differ from two, in which case the determination stage is responsive to more than two instructions. In addition, various stages are explained in reference to device 100. This reference provides exemplary non-limiting examples of the execution of method 300.

Method 300 starts by stage 310 of receiving a second instruction.

Conveniently, the receiving is followed by initializing a pipelined execution process of the second instruction or continuing the pipelined execution process of the second instruction. It is noted that the reception can be regarded as a part of the pipelined execution session. The pipelined execution process can include a fetching stage, a decoding (or issue) stage, one or more load stages and a write-back stage, and an execution stage, some of these stages or a combination of more stages. The execution stage can be executed by an arithmetic logic unit but this is not necessarily so. The one or more load stages are characteristic of load or store instructions, but this is not necessarily so.

Stage 310 can include fetching the second instruction (for example fetching the instruction from instruction memory unit 120 by fetch unit 112) and providing the fetched instruction to an issue unit such as issue unit 114 of FIG. 1.

Stage 310 is followed by stages 320 and 340. Stage 320 includes receiving a first instruction. The receiving is followed by initializing a pipelined execution process of the first instruction or continuing the pipelined execution process of the first instruction.

Stage 320 can include fetching the first instruction and even providing the fetched instruction to an issue unit such as issue unit 114 of FIG. 1.

It is noted that when stage 320 occurs the second instruction can be processed by the issue unit 114, the execute unit 116, and the like. If the results (for example processed operands) of the execution of the second instruction were already sent to the write-back unit 118 or written to the register file 130 then the execution of the second instruction will not cause the method to stall or to perform a feed-forward operation.

Stage 340 includes providing to a controller second register group size information and second register identification information that define a second group of target registers associated with the second instruction.

Referring to the example set forth in previous drawings, instruction logic 150 provides information to controller 140 via its delay units 151-155.

Conveniently, stage 340 of providing to the controller second register group size information and second register identification information includes timing the provision of this information in accordance with an execution process of the first instruction. The timing can be dictated by the delay units 151-155.

Conveniently, stage 340 is preceded by stage 390 of selecting a delay path out of multiple delay paths associated with different instruction types characterized by different execution periods. Stage 340 is also preceded by stage 394 of delaying the second register group size information in response to a type of the second instruction. Stage 394 follows stage 390.

Referring to FIG. 3, the selection is made by the short and long instruction validation logics 161 and 162, and a valid instruction is sent to the first or second delay paths 156 and 157.
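By way of a non-limiting illustration, the selection of a delay path (stage 390) and the delaying of the target register information (stage 394) can be modeled in software as two shift registers of different lengths. The sketch below is only an assumption-laden model: the names `TargetInfo`, `DelayPath` and `InstructionLogic`, the `is_long` flag and the path lengths of one and three cycles are hypothetical and are not taken from the figures.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TargetInfo:
    """Register identification and group size of a target register group."""
    reg_id: int
    group_size: int


class DelayPath:
    """A shift register; an entry emerges `length` steps after it enters."""

    def __init__(self, length: int) -> None:
        self._slots = deque([None] * length, maxlen=length)

    def step(self, entering: Optional[TargetInfo]) -> Optional[TargetInfo]:
        leaving = self._slots[0]
        self._slots.append(entering)  # maxlen discards the oldest slot
        return leaving


class InstructionLogic:
    """Routes valid information to a short or a long delay path."""

    def __init__(self, short_len: int = 1, long_len: int = 3) -> None:
        # Path lengths are illustrative assumptions only.
        self.short_path = DelayPath(short_len)
        self.long_path = DelayPath(long_len)

    def step(
        self, info: Optional[TargetInfo], is_long: bool = False
    ) -> Tuple[Optional[TargetInfo], Optional[TargetInfo]]:
        # Validation: only one path receives the information each cycle.
        short_in = info if info is not None and not is_long else None
        long_in = info if info is not None and is_long else None
        return self.short_path.step(short_in), self.long_path.step(long_in)
```

In this model a long instruction's target information reaches the output of the long path three steps after it enters, while a short instruction's information emerges after a single step.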

Stage 320 is followed by stage 350 of providing to the controller first register group size information and first register identification information that define a first group of source registers associated with the first instruction. Referring to the example set forth in FIG. 3, this information (referred to as first instruction 220) is provided to controller 140.

Stage 350 is followed by stage 360 of determining, by the controller, an execution related operation of the first instruction in response to the first register group size information, the first register identification information, the second register group size information and the second register identification information.

Referring to the example set forth in FIG. 3, controller 140 includes three units: stall unit 142, execute feed-forward unit 144 and issue feed-forward unit 146. These units determine when to perform stall operations and feed-forward operations.

Conveniently, stage 360 includes determining an identity of write-back bus lines that convey valid feed-forward information. Referring to FIG. 7, the write-back bus 119 and the register file bus 139 each include sixty-four lines, as they are eight bytes wide. In various cases the operands are smaller. For example, a QUAD operation can operate on the least significant byte of each register. Thus, according to the requested operation and to the identity of the involved registers, the controller 140 may also indicate which lines of the relevant bus should be read.
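As a hedged illustration of indicating which bus lines carry valid information, the following sketch computes a bit mask over the sixty-four lines of an eight-byte bus. It assumes, purely for the example, that an operand occupies the least significant bytes of the bus and that line 0 is the least significant line; the function name `valid_line_mask` is hypothetical.

```python
BUS_BYTES = 8  # the bus is eight bytes (sixty-four lines) wide


def valid_line_mask(operand_bytes: int) -> int:
    """Return a 64-bit mask whose set bits mark the bus lines to read.

    Assumption for this sketch: the operand sits in the least
    significant `operand_bytes` bytes of the bus.
    """
    if not 1 <= operand_bytes <= BUS_BYTES:
        raise ValueError("operand must be between 1 and 8 bytes wide")
    return (1 << (8 * operand_bytes)) - 1
```

Under these assumptions a one-byte operand selects only the eight least significant lines, and an eight-byte operand selects all sixty-four.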

Conveniently, stage 360 includes determining a relationship between the second group of target registers and the first group of source registers. If there is an overlap between these groups of registers, then a stall operation and/or a feed-forward operation may be required.

Conveniently, stage 360 includes stage 362 of masking the first register identification information by the first register group size to provide a first masked instruction register identifier. Stage 362 is followed by stage 364 of masking the second register identification information by the second register group size to provide a second masked instruction register identifier. Stage 364 is followed by stage 366 of comparing the second and first masked instruction register identifiers.
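The masking and comparing of stages 362-366 can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the circuit of controller 140: it assumes that each group size is a power of two, that groups are aligned to their size, and that masking clears log2(group size) least significant bits (so that, for a group of four registers, two bits are masked). Where the group sizes differ, the sketch compares the masked identifiers at the coarser of the two granularities, which is an assumption of the example. The function names are hypothetical.

```python
def mask_register_id(reg_id: int, group_size: int) -> int:
    """Clear the log2(group_size) least significant bits of reg_id."""
    if group_size <= 0 or group_size & (group_size - 1):
        raise ValueError("group_size must be a power of two")
    return reg_id & ~(group_size - 1)


def groups_overlap(src_id: int, src_size: int,
                   tgt_id: int, tgt_size: int) -> bool:
    """Return True if the source and target register groups overlap."""
    masked_src = mask_register_id(src_id, src_size)  # stage 362
    masked_tgt = mask_register_id(tgt_id, tgt_size)  # stage 364
    # Stage 366: aligned power-of-two groups are either nested or
    # disjoint, so comparing at the coarser granularity detects overlap.
    coarse = max(src_size, tgt_size)
    return (mask_register_id(masked_src, coarse)
            == mask_register_id(masked_tgt, coarse))
```

For example, a source group of four registers starting at register 4 overlaps a single target register 6, whereas two groups of two registers starting at registers 0 and 2 do not overlap.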

Stage 360 is followed by stage 370 of selectively performing the execution related operation. Conveniently, stage 370 includes stalling an execution of the first instruction or performing a feed-forward operation. It is noted that stage 370 may include continuing or even finishing the pipelined execution process of the first instruction without stalling it and without performing a feed-forward operation.

Stage 370 can include selectively performing a feed-forward operation to a decoding unit of a pipelined processor and selectively performing a feed-forward operation to an execution unit of the pipelined processor. Referring to the example set forth in FIG. 3, the issue feed-forward unit 146 controls a feed-forward operation to the issue unit 114. The execute feed-forward unit 144 controls a feed-forward operation to the execute unit 116.
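The selective nature of stage 370 can be illustrated by a small decision function. The sketch below is hypothetical: the `ProducerState` values, the `consumer_in_execute` flag and the rule that a pending result causes a stall while an available result is forwarded are assumptions made for the example, and the actual conditions are determined by the stall and feed-forward units.

```python
from enum import Enum, auto


class ProducerState(Enum):
    """Assumed progress of the earlier (second) instruction."""
    PENDING = auto()      # result not yet produced
    FORWARDABLE = auto()  # result available for feed-forwarding
    RETIRED = auto()      # result already in the register file


class Action(Enum):
    CONTINUE = auto()
    STALL = auto()
    FEED_FORWARD_TO_ISSUE = auto()
    FEED_FORWARD_TO_EXECUTE = auto()


def select_action(overlap: bool, producer: ProducerState,
                  consumer_in_execute: bool) -> Action:
    """Choose the execution related operation for the first instruction."""
    if not overlap or producer is ProducerState.RETIRED:
        return Action.CONTINUE  # no dependency left to resolve
    if producer is ProducerState.PENDING:
        return Action.STALL     # the needed result does not exist yet
    if consumer_in_execute:
        return Action.FEED_FORWARD_TO_EXECUTE
    return Action.FEED_FORWARD_TO_ISSUE
```

Without an overlap between the register groups the first instruction simply continues, which corresponds to the case in which neither a stall nor a feed-forward operation is performed.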

Conveniently, stage 360 of determining is responsive to additional information relating to more than two instructions.

For example, the method can include stage 380 of providing to the controller a third register group size information and a third register identification information that define a third group of target registers associated with a third instruction. In this case stage 360 can be further responsive to the third register group size information and to the third register identification information.

Referring to the example set forth in FIG. 3, each unit out of stall unit 142, execute feed-forward unit 144 and issue feed-forward unit 146 compares information associated with the first instruction with valid information that can be delayed by one or more cycles. The information can be provided from two delay paths.
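When more than one earlier instruction is in flight, the comparison can be repeated for each valid target group. The sketch below illustrates this under the same assumptions as before (power-of-two, size-aligned register groups); the tuple layout and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Group = Tuple[int, int]  # (register identification, group size)


def overlaps(src: Group, tgt: Group) -> bool:
    """Compare two aligned power-of-two groups at the coarser size."""
    coarse = max(src[1], tgt[1])
    return (src[0] & ~(coarse - 1)) == (tgt[0] & ~(coarse - 1))


def conflicting_targets(src: Group, in_flight: Sequence[Group]) -> List[int]:
    """Return the indices of in-flight target groups that overlap src."""
    return [i for i, tgt in enumerate(in_flight) if overlaps(src, tgt)]
```

For a source group of four registers starting at register 4 and in-flight target groups (6, 1), (8, 2) and (0, 8), the function reports the first and third target groups as conflicting.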

Variations, modifications, and other implementations of what is described herein will occur to those of ordinary skill in the art without departing from the spirit and the scope of the invention as claimed. Accordingly, the invention is to be defined not by the preceding illustrative description but instead by the spirit and scope of the following claims.

1. A device comprising: a pipelined processor; an instruction memory unit; and a register file; wherein the pipelined processor comprises: a write-back unit; and an execution unit; a controller that is adapted to receive a first register group size information and a first register identification information that define a first group of source registers associated with a first instruction, to mask the first register identification information, and to determine an execution related operation of the first instruction in response to the first register group size information, the masked first register identification information, a second register group size information and a second register identification information, wherein a total number of least significant bits masked in the masked first register identification information is less than a total number of source registers associated with the first instruction; wherein the second register group size information and the second register identification information define a second group of target registers associated with a second instruction; and wherein the second instruction is provided to the pipelined processor before the first instruction.
2. The device according to claim 1 wherein the execution related operation is a stall operation.
3. The device according to claim 1, wherein the execution related operation comprises a receiving at least one result of an execution of the second instruction from the write-back unit.
4. The device according to claim 1, wherein the controller is adapted to determine the execution related operation in response to a relationship between the second group of target registers and the first group of source registers.
5. The device according to claim 1, wherein the controller is adapted to mask the second register identification information by the second register group size information to provide a second masked instruction register identifier; and to compare between the second and first masked register identification information.
6. The device according to claim 1, wherein the controller is further adapted to receive a third register group size information and a third register identification information that define a third group of target registers associated with a third instruction, and to determine an execution related operation of the first instruction in response to the first and second register group size information, the first and second register identification information, the third register group size information and the third register identification information; wherein the third register group size information and the third register identification information define a third group of target registers associated with a third instruction; and wherein the third instruction is provided to the pipelined processor before the first instruction.
7. The device according to claim 1, wherein the controller is coupled to an instruction logic that provides to the controller register group size information and register identification information in accordance to an execution process of instructions associated with the register group size information and the register identification information.
8. The device according to claim 1, wherein the instruction logic comprises multiple delay paths, wherein a length of each delay path is responsive to an execution period of a certain type of instruction; and wherein the instruction logic provides to the controller delayed register group size information and delayed register identification information.
9. The device according to claim 1, wherein the controller comprises a stall unit, and a feed-forward unit; wherein the stall unit is adapted to determine an execution of a stall operation; and wherein the feed-forward unit is adapted to determine an execution of a feed-forward operation.
10. The device according to claim 1, wherein the pipelined processor further comprises a decoding unit and wherein the controller controls a provision of information to the execution unit and to the decoding unit.
11. The device according to claim 1, wherein the pipelined processor is adapted to determine an identity of write-back bus lines that convey valid feed-forward information.
12. A method for processing instructions, the method comprises: receiving a second instruction and initializing a pipelined execution process of the second instruction; receiving a first instruction and initializing a pipelined execution process of the first instruction; providing to a controller a second register group size information and a second register identification information that define a second group of target registers associated with the second instruction; providing to the controller a first register group size information and a first register identification information that define a first group of source registers associated with the first instruction; masking, by the controller, the first register identification information, wherein a total number of least significant bits masked in the masked first register identification information is less than a total number of source registers associated with the first instruction; determining, by the controller, an execution related operation of the first instruction in response to the first register group size information, the masked first register identification information, the second register group size information and the second register identification information; and performing the execution related operation; wherein the pipelined execution process of the first instruction comprises the execution related operation.
13. The method according to claim 12, wherein the executing comprises performing a feed-forward operation.
14. The method according to claim 12, wherein the determining comprises determining a relationship between the second group of target registers and the first group of source registers.
15. The method according to claim 12, wherein the determining comprises: masking the second register identification information by the second register group size information to provide a second masked register identifier; and comparing between the second and first masked register identification information.
16. The method according to claim 12, further comprising providing to the controller a third register group size information and a third register identification information that define a third group of target registers associated with a third instruction; and wherein the determining is further responsive to the third register group size information and to the third register identification information.
17. The method according to claim 12, wherein the providing to the controller the second register group size information comprises timing the providing in accordance to an execution process of the first instruction.
18. The method according to claim 12, wherein the providing to the controller the second register group size information is preceded by selecting a delay path out of multiple delay paths associated with different instruction types characterized by different execution periods; and delaying the second register group size information in response to a type of the second instruction.
19. The method according to claim 12, wherein the performing comprises performing a stall operation by a stall unit, and performing a feed-forward operation by a feed-forward unit.
20. The method according to claim 12, wherein the performing comprises selectively performing feed-forwarding to a decoding unit of a pipelined processor and selectively performing feed-forwarding to an execution unit of the pipelined processor.
21. The method according to claim 12, wherein the determining comprises determining an identity of write-back bus lines that convey valid feed-forward information.
22. A device comprising: a pipelined processor; an instruction memory unit; and a register file; wherein the pipelined processor comprises: a write-back unit; and an execution unit; a controller that is adapted to receive a first register group size information and a first register identification information that define a first group of source registers associated with a first instruction, and to determine an execution related operation of the first instruction in response to the first register group size information, the first register identification information, a second register group size information and a second register identification information, and the controller is adapted to mask the first register identification information by the first register group size to provide a first masked instruction register identifier, to mask the second register identification information by the second register group size to provide a second masked instruction register identifier, and to compare between the second masked instruction register identifier and the first masked instruction register identifier, wherein a total number of least significant bits masked in the first masked instruction register identifier is less than a total number of source registers associated with the first instruction, and a total number of least significant bits masked in the second masked instruction register identifier is less than a total number of target registers associated with the second instruction; wherein the second register group size information and the second register identification information define a second group of target registers associated with a second instruction; and wherein the second instruction is provided to the pipelined processor before the first instruction.